July 06, 2018
You might be interested in this talk if you are:
This might not be the talk for you if:
# The meat of the presentation
```python
for problem in concrete_problems_paper:
    problem.describe()
    gridworld(problem).show()
    problem.describe_solutions()
```
In this presentation, we will be considering algorithms that learn via reinforcement. In the reinforcement learning framework, you have:
In reinforcement learning, the agent receives a reward from the environment each time it takes an action, signalling its incremental progress.
| Example | Agent | Environment | Action | Reward |
|---|---|---|---|---|
| Chess | Player | Board position | Move a piece | Win or Lose |
| Paper clip collector | Self | The world | Send packets via the internet | Number of paperclips |
| Heat pump | Self | Sensor data | Adjust heater | Distance from Ideal Temperature |
| Investment | Investor | Stock Market | Buy or sell stocks | Profit |
| … | … | … | … | … |
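The agent-environment loop behind each row of that table can be sketched in a few lines of Python. Everything here (the `Environment`, the `Agent`, and the reward rule) is an illustrative stand-in, not code from any of the papers discussed:

```python
# A minimal sketch of the reinforcement learning loop.
# `Environment` and `Agent` are illustrative stand-ins, not a real library.

import random

class Environment:
    """A trivial environment: reward is 1 whenever the agent picks action 1."""
    def reset(self):
        return 0  # initial state

    def step(self, action):
        reward = 1 if action == 1 else 0
        next_state, done = 0, False
        return next_state, reward, done

class Agent:
    def act(self, state):
        return random.choice([0, 1])  # a (very naive) policy

env, agent = Environment(), Agent()
state = env.reset()
total_reward = 0
for _ in range(100):
    action = agent.act(state)               # the agent takes an action...
    state, reward, done = env.step(action)  # ...the environment responds with a reward
    total_reward += reward
    if done:
        state = env.reset()
```

Learning algorithms differ in how they use the stream of rewards to improve the policy, but they all sit inside a loop shaped like this one.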
A Reinforcement Learning agent uses two tools to make decisions: a policy and a value function.
Given any state of the environment: the policy tells the agent which action to take, and the value function estimates how much reward the agent can expect to collect from there.
The goal of Reinforcement Learning is to take good actions. This amounts to finding an optimal policy.
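As a sketch, both tools can be written as lookup tables over the states of a tiny gridworld. The cells, actions, and numbers below are made up for illustration; real agents learn these tables, or approximate them with function approximators:

```python
# Sketch: a tabular policy and value function for a tiny 2x2 gridworld.
# States are (row, col) cells; the entries are hand-written for illustration.

policy = {            # policy: state -> action
    (0, 0): "right",
    (0, 1): "down",
    (1, 1): "stay",   # goal cell
}

value = {             # value function: state -> expected future reward
    (0, 0): 0.81,
    (0, 1): 0.9,
    (1, 1): 1.0,      # the goal is worth 1
}

state = (0, 0)
print(policy[state], value[state])  # prints: right 0.81
```

An optimal policy is one whose actions maximise the value of every state it can reach.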
Concrete Problems in AI Safety is a 2016 paper by researchers from Google Brain, OpenAI, Stanford and UC Berkeley.
The paper outlines five ‘tractable’ problems which, if solved, would represent tangible progress towards safer AI systems.
We will also be using the 2017 paper AI Safety Gridworlds from DeepMind to provide some context for discussion.
A gridworld is a simplified environment that a Reinforcement Learning agent can interact with. They’re handy for testing algorithms.
The gridworlds in this presentation come from the AI Safety Gridworlds paper, and have been open sourced at: https://github.com/deepmind/ai-safety-gridworlds
In the process of completing its objective, it will almost always be necessary for an agent to change its environment.
“As you will no doubt be aware, the plans for development of the outlying regions of the Galaxy require the building of a hyperspatial express route through your star system, and regrettably your planet is one of those scheduled for demolition. The process will take slightly less than two of your Earth minutes. Thank you.”
Avoiding negative side effects is about incentivising Reinforcement Learning agents to minimise unnecessary impact on the environment, while still achieving their goal.
We want the agent to reach the goal, without disturbing the box.
How do you think we could approach this problem?
We want to avoid side effects because the environment is already pretty good according to our preferences.
If you know which negative side effects to care about in advance, you can directly penalise your agent for triggering them.
But in the real world, there are so many that you don’t want to name them all.
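Penalising a known side effect amounts to reward shaping: subtract a penalty whenever the named effect occurs. The function, the `box_moved` flag, and the penalty weight below are hypothetical, just to show the shape of the idea:

```python
# Sketch: directly penalising a known side effect via reward shaping.
# `box_moved` and the penalty weight are illustrative, not from the paper.

SIDE_EFFECT_PENALTY = 5.0

def shaped_reward(base_reward, box_moved):
    """Subtract a penalty whenever the named side effect (moving the box) occurs."""
    penalty = SIDE_EFFECT_PENALTY if box_moved else 0.0
    return base_reward - penalty

print(shaped_reward(10.0, box_moved=False))  # 10.0: goal reached cleanly
print(shaped_reward(10.0, box_moved=True))   # 5.0: goal reached, but box disturbed
```

The catch, as noted above, is that this only works for the side effects you thought to enumerate.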
The Concrete Problems in AI Safety paper also suggests some possible solutions.
This is not just a problem with AI systems.
“When a measure becomes a target, it ceases to be a good measure” (Goodhart’s Law)
How could we get an agent to ignore the bucket?
There are strong ‘no free lunch’ theorems which suggest that a general solution to reward hacking is impossible.
This is because the agent has no way of knowing which rewards are ‘the real deal’ in the general case.
One problem is that the agent can’t check if a state is giving a hacked reward without being in that state.
We can teach the agent by example, and hopefully this can generalise to better cover reward hacks.
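One way to sketch ‘teaching by example’: keep a table of human-labelled rewards and prefer those labels over the raw environment signal. The states and labels here are invented (the bucket is a nod to the example above), and a real reward model would be a trained predictor rather than a lookup table:

```python
# Sketch: learning rewards from human-labelled examples instead of trusting
# the raw environment signal alone. All states and labels are illustrative.

examples = {                 # human judgements: state -> true reward
    "bucket_visible": 1.0,
    "bucket_covered": 0.0,   # covering the bucket is a hack, not progress
}

def modelled_reward(state, raw_reward):
    """Prefer the human label when we have one; fall back to the raw signal."""
    return examples.get(state, raw_reward)

print(modelled_reward("bucket_covered", raw_reward=1.0))  # 0.0: hack caught
print(modelled_reward("new_state", raw_reward=0.5))       # 0.5: no label, trust the environment
```

The hope is that a learned model of the examples generalises to reward hacks that were never explicitly labelled.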
How do you learn to do something that might be dangerous? What if you can’t afford to make mistakes on the way?
If I want to learn about tigers, should I buy a tiger, or buy a book about tigers?
Move the agent to the goal without going in the water.
Suppose we are training an agent to clean the office.
We want it to care about cleaning the whole office, even if we don’t take the time to check everywhere when we evaluate its progress.
Avoid the behaviour that is punished when the supervisor is present, whether or not the supervisor is actually there.
Scalable oversight could be considered a missing value problem.
We want the agent to infer what the Supervisor would offer as a reward if it were present.
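Treated as a missing-value problem, this can be sketched very simply: when the supervisor is absent, fill in the reward with an estimate built from the supervised episodes. The mean used here is a deliberately crude estimator, and all the numbers are illustrative:

```python
# Sketch: scalable oversight as missing-value imputation. When the supervisor
# is absent, impute the reward they *would* have given, here using the mean
# of rewards observed while the supervisor was present. Values illustrative.

supervised_rewards = [1.0, 0.0, 1.0, 1.0]  # rewards seen under supervision

def reward(observed, supervisor_present):
    if supervisor_present:
        supervised_rewards.append(observed)
        return observed
    # supervisor absent: fill in the missing value with an estimate
    return sum(supervised_rewards) / len(supervised_rewards)

print(reward(1.0, supervisor_present=True))   # 1.0, and recorded for later
print(reward(None, supervisor_present=False)) # 0.8: the mean of supervised rewards
```

A real agent would condition the estimate on the state it is in, not just average over history.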
Because of this, we can run into the problem of sparse rewards.
Building AI systems that can learn from an environment which provides sparse rewards only is an ongoing research problem.
Often, the environment that an agent is trained in doesn’t look much like the real world.
When interacting with the environment, an agent can experience distributional shift when the state of the environment is somehow fundamentally different from its previous experience.
In these situations, we want our agent to either make a good inference about what to do, or to fail in a safe manner.
Reach the goal while avoiding the lava. The agent is trained in the gridworld on the left, and evaluated in the gridworld on the right.
These days there are agents which learn using a Bayesian method called variational inference.
The upshot of this is that they can assign a confidence level to the decisions they make, and so we can make them halt safely if their environment doesn’t match their prior experience.
At the same time, we would also like the agent to be robust to changes in the environment, and infer the correct action in a new environment instead of just giving up.
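The ‘fail safely’ behaviour can be sketched independently of how the confidences are produced. Here the action probabilities are made-up numbers standing in for what a variational-inference agent might report, and the threshold is an arbitrary choice:

```python
# Sketch of the "fail safely" decision rule: act only when the agent's
# confidence clears a threshold, otherwise halt. The probabilities are
# invented; a real agent might get them from variational inference.

CONFIDENCE_THRESHOLD = 0.8

def choose(action_probs):
    """Return the most probable action, or 'halt' if the agent is too uncertain."""
    best_action = max(action_probs, key=action_probs.get)
    if action_probs[best_action] < CONFIDENCE_THRESHOLD:
        return "halt"  # the environment looks unfamiliar: fail safely
    return best_action

print(choose({"left": 0.95, "right": 0.05}))  # confident -> "left"
print(choose({"left": 0.55, "right": 0.45}))  # uncertain -> "halt"
```

Tuning the threshold is exactly the trade-off described above: too low and the agent acts recklessly under distributional shift, too high and it gives up in any new environment.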
A lot of the main points in this presentation amount to widely understood pitfalls of statistical modelling.
Hopefully there has been some value in a principled approach, in particular:
I am keen to collaborate on gridworlds, or other interesting machine learning problems; feel free to chat after this talk.
If you’d like to discuss possible Data Science projects, check out our website https://dragonfly.co.nz and get in touch.
If you found this interesting, definitely check out Robert Miles’ YouTube channel.
If you want to learn about Reinforcement Learning, I recommend this lecture series by David Silver of DeepMind.
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI Safety Gridworlds. arXiv preprint arXiv:1711.09883, 2017.
Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. Reinforcement learning with a corrupted reward channel. In International Joint Conference on Artificial Intelligence, 2017.
If you’re curious about bias in machine learned representations of language:
Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. “Semantics derived automatically from language corpora contain human-like biases.” Science 356.6334 (2017): 183-186
Or anything by Arvind Narayanan.